Flexible power reduction for embedded components



TECHNICAL FIELD 

Data processing system, method for processing data. 

BACKGROUND ART 

Programmable platforms may include components such as a central processing
unit (CPU), one or more coprocessors, and a shared bus that connects the various processors.
In media processing applications, the processing of the functions is distributed to the central
processing unit and the coprocessors. Such functions may be defined in hardware, in
software, or in a mixture thereof. This choice may depend, amongst others, on the function
itself, the manufacturing volume of the function, and the circuit in question. The CPU is
software controlled and can be adapted to many different desired purposes by the use of
suitable software, providing great flexibility. A coprocessor is dedicated to execute a
specific function. In general, for a given function, a software-controlled processor is usually
less efficient in silicon area and power consumption than a coprocessor dedicated to that
function, but on the other hand a software-controlled processor is more flexible. The CPU
may also act as a controller for the platform.

The media processing may include video, graphics or audio processing. The
utilization of each coprocessor may vary both between different applications and during
execution of a single application, depending on the character of the media processing
application or the mode of operation for certain use cases. As a result, one or more
coprocessors may not be effectively utilized during a certain part of the media processing. In
case of a synchronous system those coprocessors continue consuming power, since they still
receive a clock signal. In order to reduce the power consumption of synchronous
programmable platforms, the clock frequency of the platform can be lowered, according to
the coprocessor with the highest utilization. Another approach is to lower the supply voltage
of the platform. Unused coprocessors can also be powered down statically. However, in all
these cases a substantial number of the coprocessors will still provide more processing
capacity than required at a specific moment and therefore also consume more power than
required.

DISCLOSURE OF INVENTION

It is an object of the invention to provide a data processing system having a
distributed power control, allowing an individual component to be powered down dynamically.

This object is achieved with a data processing system comprising a plurality
of processing elements, which are arranged for synchronously processing data under control
of at least one clock facility. The data processing system further comprises at least one local
controller associated with a processing element of the plurality of processing elements, and a
data communication means arranged for exchanging data between processing elements of the
plurality of processing elements, wherein the local controller is arranged for powering down
its associated processing element depending on the required processing capacity of that
processing element. Depending on the workload of a coprocessor, the local controller powers
down the coprocessor, allowing a dynamic power control. Since each coprocessor may have
a local controller, the power management is distributed over the processing system, i.e. a
global control mechanism for power management is not required. Such a global control
mechanism introduces a substantial amount of overhead, especially in case of a data processing
system with a relatively large number of processing elements, and the difference in use cases
may complicate this further. The power control of an individual coprocessor is transparent to
the rest of the processing system, meaning that the other coprocessors have no need to know
about the current power status of that specific coprocessor. At any time, if required, any
processing element or a combination of processing elements will become available
automatically. Powering down of a processing element includes both completely switching
off power for the processing element and putting the processing element in a sleep
mode.

US2002/0007463A1 describes a computer system comprising a number of
units that operate as servers. Each unit has at least one processor and an activity monitor that
identifies the level of activity for the processor. Each unit is operable in three different
modes, having mutually different power consumption rates. A controller is coupled to the
units of the computer system and receives information on the level of activity from each unit.
The controller analyses this information and determines an operating mode for each unit.
Subsequently, the controller generates commands to each unit for directing that unit to
operate in the determined operating mode. However, this document does not disclose a
distributed power management system without the need of a global control mechanism.

US2003/0025689A1 describes a power management method for an electronic
device, such as a computer system. The method comprises several power conservation
techniques, including static power controls, dynamic power controls and a flexible clock
generator that may include one or more different programmable clock policies with
programmable clock rates. The static power control is used for powering down any unused
functional modules at different times. The dynamic power control utilizes the clocking
mechanism to reduce power consumption of the complete system. Using the flexible clock
generator the appropriate clock speed is set to provide just enough clock speed for the
particular task at hand. It does not disclose, however, how to dynamically power down one or
more hardware units separately.

An embodiment of the invention is characterized in that the data processing
system further comprises at least one buffer associated with the processing element of the
plurality of processing elements, wherein the buffer is arranged for exchanging data between
its associated processing element and the data communication means, and wherein the local
controller is arranged to determine the required processing capacity of its associated
processing element from the filling degree of the associated buffer. Using the filling degree
of the associated buffer is a relatively simple way of determining the workload of the
associated processing element. In case the buffer is empty, the local controller powers down
the processing element. As soon as the buffer is at least partially filled again, the local
controller powers up the processing element.
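
By way of illustration only, the buffer-driven rule of this embodiment can be modelled in a few lines of Python. This is a minimal sketch and not the claimed hardware; the class name `LocalController` and its methods `push`, `pop` and `update` are hypothetical names introduced here, and the buffer is assumed to be a simple first-in first-out queue.

```python
from collections import deque


class LocalController:
    """Hypothetical model of a local controller watching one input buffer."""

    def __init__(self):
        self.buffer = deque()   # buffer of the associated processing element
        self.powered = False    # nothing to process yet, so start powered down

    def update(self):
        # The filling degree drives the power state:
        # empty buffer -> powered down, non-empty buffer -> powered up.
        self.powered = len(self.buffer) > 0
        return self.powered

    def push(self, item):
        # Data arrives from the data communication means.
        self.buffer.append(item)
        return self.update()

    def pop(self):
        # The processing element consumes one item from its buffer.
        item = self.buffer.popleft()
        self.update()
        return item


ctrl = LocalController()
state_after_fill = ctrl.push("block0")   # buffer filled, element powered up
ctrl.pop()                               # buffer drained
state_after_drain = ctrl.powered         # element powered down again
```

In this sketch no other component needs to know the power state; only the filling degree of the local buffer is consulted.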

An embodiment of the invention is characterized in that the data processing
system further comprises a control processor, wherein the local controller is arranged to
receive information on the required processing capacity of the associated processing element
from the control processor, and wherein the local controller is further arranged to have
information on the processing capacity of the associated processing element. Using the
information, the local controller determines the time interval that the corresponding
processing element is idle, and powers down the processing element, depending on the length
of this time interval. Once the processing element receives new data to process, the local
controller powers up the corresponding processing element.

An embodiment of the invention is characterized in that the processing
element of the plurality of processing elements is further arranged to generate an interrupt for
notifying its associated local controller on the required processing capacity. In case the
processing element has finished processing data, it notifies its corresponding local controller.
Subsequently, the local controller powers down the processing element. At the moment new
data for processing arrive, the processing element is powered up again.

An embodiment of the invention is characterized in that a sequence of clock
cycles effects a processing operation of an amount of data, wherein the data processing
system further comprises programmable means for implementing programmable stall clock
cycles for the processing element of the plurality of processing elements, wherein the
programmable stall clock cycles are interspersed between clock cycles of the sequence of
clock cycles. In case blocks of data are offered at regular times, it may be the case that the
processing of a block of data has already finished before the next block of data has arrived.
Programming of stall cycles between the clock cycles for processing of data can be used in
order to reduce the peak load of bandwidth consumption of a coprocessor. On the other hand,
the remaining time can be used to power down the coprocessor for reasons of power savings.
An advantage of this embodiment is that it allows exploiting the trade-off between spreading
the bandwidth consumption and power savings, and making an optimization depending on
the requirements of the system.

An embodiment of the invention is characterized in that at least one processing
element is associated with a bandwidth control unit for controlling a rate of its data transfer
along the data communication means, the bandwidth control unit restricting the data transfer
if it exceeds an allowed maximum data rate. In case blocks of data are offered for processing
at regular times, it may be the case that the processing of a block of data has already finished
before the next block of data has arrived. The bandwidth control unit can adapt the
consumption of bandwidth by a processing element to a level that is suitable for the function
actually performed. The bandwidth consumption can be averaged over the time interval
between the arrivals of two data blocks. Alternatively, the remaining time can be used to
power down the coprocessor. As in case of a previous embodiment, an optimization between
spreading the bandwidth consumption and power savings can be made, depending on the
system requirements.

Further embodiments of the invention are described in the dependent claims.
According to the invention, a method for processing data according to claim 9
is provided as well.

BRIEF DESCRIPTION OF FIGURES

Figure 1 shows an embodiment of a data processing system according to the
present invention.

Figure 2 shows another embodiment of a data processing system according to
the present invention.

Figure 3 shows an embodiment of a bandwidth control unit. 

DESCRIPTION OF EMBODIMENTS 

Figure 1 and Figure 2 illustrate embodiments of a data processing system
according to the present invention. Referring to both Figure 1 and 2, the data processing
system comprises a system bus SB, a shared memory MEM, an input unit IU, an output unit
OU, a central processing unit CPU, coprocessors COP1 and COP2, bus interfaces BI1 and
BI2, and local controllers CTR1 and CTR2. The data processing system also comprises a
system clock, not shown in Figure 1 and 2, for sending clock signals to all components of the
system. In alternative embodiments, the data processing system may have a plurality of
clocks for operation of different components of the system at a different clock speed. The
system bus SB and the memory MEM are shared by the central processing unit CPU, input
unit IU, output unit OU and coprocessors COP1 and COP2. The data processing system
executes media processing applications, for example in the field of video, graphics or audio
processing. The central processing unit CPU controls the overall system. Next to controlling
the memory MEM, the central processing unit CPU may immediately access various control
registers in the coprocessors COP1 and COP2. The central processing unit CPU may also
execute a software program containing parts of the functionality of the media processing
application. The coprocessors COP1 and COP2 are dedicated for executing specific media
processing functions in hardware, and these functions of the media processing application are
mapped onto the coprocessors COP1 and COP2. For example, in case of an MPEG
application, functions representing a Discrete Cosine Transform (DCT) function or a motion
estimation function can be mapped onto coprocessors COP1 and COP2 respectively, which
are dedicated to execute these specific functions. Input data, such as speech or image inputs, are
received via the input unit IU and are subsequently processed by central processing unit CPU
and coprocessors COP1 and COP2. The output data are written to the output unit OU, which
outputs the data to another data processing system, or to a display device, to name a few. In
some embodiments, the input unit IU receives input data at regular time intervals. In other
embodiments, the input unit IU receives bursts of input data, depending on the media
application or the source of input data, to name a few. In some embodiments, the output unit
OU may output data at regular time intervals. In different embodiments the output unit OU
outputs data in bursts. Intermediate results obtained during the data processing can be stored
by the coprocessors COP1 and COP2 or the central processing unit CPU in the memory
MEM, via the system bus SB, and subsequently retrieved from the memory MEM for further
processing. Since various ones of the coprocessors COP1 and COP2, input unit IU, output
unit OU, and central processing unit CPU can initialize transfer of data via the system bus SB
independent of the others, an arbitration mechanism is necessary to sequentialize the bus
transfers, and in the case shown, for controlling memory accesses. For this purpose a bus
arbiter, not shown in Figure 1 and 2, can be used. The coprocessors COP1 and COP2
communicate with the system bus SB via bus interface BI1 and BI2, respectively. These bus
interfaces BI1 and BI2 comprise an input buffer for buffering data that has to be transferred
from the system bus SB to the coprocessor, and an output buffer for buffering data that has to
be transferred from the coprocessor to the system bus SB. In alternative embodiments, two
separate bus interfaces can be used for a coprocessor, comprising an input buffer and an
output buffer, respectively. In yet another embodiment, a coprocessor may have multiple bus
interfaces for receiving input data and/or multiple bus interfaces for outputting data, for
example for transferring data related to different images via different bus interfaces. The
input and output buffers allow the system bus SB to work independently of the coprocessors
COP1 and COP2. The local controllers CTR1 and CTR2 can power down the coprocessors
COP1 and COP2, respectively, depending on the workload of those coprocessors, as will be
explained in the next paragraphs. The coprocessors COP1 and COP2 can be implemented by,
for example, dedicated hardware, a programmable processor loaded with software to execute
a dedicated function, for example a Very Long Instruction Word processor, or reconfigurable
hardware, for example a Field Programmable Gate Array.

In different embodiments, the data processing system may have more than two
coprocessors, or a different number of CPUs, or a different number of memory units,
depending, for example, on the type of media processing application for which the data
processing system is designed. Alternatively, the input unit IU and output unit OU can be
integrated in a coprocessor.

Referring now to Figure 1, local controller CTR1 is coupled to bus interface
BI1 and local controller CTR2 is coupled to bus interface BI2. During data processing, input
data are transferred to the input buffers of the bus interfaces BI1 and BI2. The data
processing may include streaming processing, i.e. processing of video fields or frames, slices
of data, to name a few, within regular processing periods. The coprocessors COP1 and COP2
read these data from the corresponding input buffer of bus interfaces BI1 and BI2, process
the data and write the result data to the corresponding output buffers of the bus interfaces BI1
and BI2. Via the system bus SB the result data are written to memory MEM, or to the output
unit OU. The system bus SB is a shared resource, and during data processing the situation
may occur that coprocessor COP1 initializes a request to retrieve data from memory MEM
via the system bus SB, while at that moment a series of bus requests by other components of
the data processing system is still pending. The bus request of coprocessor COP1 is added to
the queue of bus requests, while coprocessor COP1 continues processing data that are stored
in the input buffer of BI1. At the moment that input buffer is empty, the coprocessor COP1 is
stalled by the bus interface BI1. The local controller CTR1 detects that the corresponding
input buffer is empty, and powers down the coprocessor COP1. As soon as the bus request
initialized by coprocessor COP1 is handled, data are written from memory MEM to the input
buffer of bus interface BI1. The local controller CTR1 detects that the input buffer of bus
interface BI1 contains data, and powers up the coprocessor COP1, which continues
processing data from the corresponding input buffer. As a result, a dynamic, distributed
power control is obtained, depending only on the amount of data that a coprocessor has to
process. Furthermore, the local controller only requires relatively simple hardware. In an
alternative embodiment, the processing element is powered up only after a certain amount of
data is present in the corresponding input buffer. In some embodiments, the input unit IU
and/or the output unit OU may also have a local controller, which powers down the
corresponding unit in case no data are received or output, respectively, for example in case
the transfer of data goes via bursts.
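
As a non-limiting sketch of the alternative embodiment just mentioned, the following Python function traces the power state of a coprocessor over a series of observed input buffer fill levels. The function name `power_states` and the parameter `wake_threshold` are hypothetical. It is assumed here that the coprocessor is powered down when the buffer is empty, powered up once at least `wake_threshold` items are buffered, and otherwise left in its current state.

```python
def power_states(fill_levels, wake_threshold=1):
    """Return the power state after each observed buffer fill level.

    Power down when the buffer is empty; power up only once at least
    `wake_threshold` items are buffered (hypothetical parameter).
    """
    powered = False
    states = []
    for fill in fill_levels:
        if fill == 0:
            powered = False          # empty input buffer: power down
        elif fill >= wake_threshold:
            powered = True           # enough data buffered: power up
        # 0 < fill < wake_threshold: keep the current state
        states.append(powered)
    return states


levels = [0, 1, 2, 3, 1, 0, 2, 4]
immediate = power_states(levels, wake_threshold=1)  # wake on any data
delayed = power_states(levels, wake_threshold=3)    # wake after 3 items
```

With a threshold of 1 the trace follows the basic embodiment; a higher threshold results in fewer power-up events at the cost of a later start of processing.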

Referring to Figure 2, local controller CTR1 is coupled to bus interface BI1,
local controller CTR2 is coupled to bus interface BI2, and the local controllers CTR1 and
CTR2 are both coupled to the system bus SB. During streaming processing, the central
processing unit CPU activates the coprocessors COP1 and COP2 to start processing data by
writing information in the control registers of the coprocessors. This information may
include: memory addresses of the memory MEM, height and width of a video frame to be
processed and the number of frames per second that have to be processed by that
coprocessor. The height and width of a video frame relate to the amount of data that has to be
processed for one video frame. At the moment the coprocessor COP1 or COP2 has finished
processing data for a given video frame, the coprocessor generates an interrupt to notify the
central processing unit CPU. In an embodiment of the present invention, the coprocessors
COP1 and COP2 also send an interrupt to the corresponding local controller CTR1 and
CTR2, which subsequently power down the coprocessor COP1 and COP2, respectively. In
another embodiment, the local controllers CTR1 and CTR2 have registers to store
information on the number of frames per second that the corresponding coprocessor has to
process. This information can be stored in the registers of coprocessors COP1 and COP2 by
the central processing unit CPU. Using this information, the local controllers CTR1 and
CTR2 calculate the time interval between the receipts of two video frames. At the moment
the coprocessors COP1 and COP2 start processing a series of video frames, the
corresponding local controller starts an internal timer. When the coprocessors COP1 and
COP2 finish processing a video frame, an interrupt is sent to local controllers CTR1 and
CTR2 respectively. The local controllers CTR1 and CTR2 determine the time interval
between the receipt of the interrupt and the start of the processing of a next video frame.
Depending on the length of that time interval, the local controllers CTR1 and CTR2 power
down the corresponding coprocessor COP1 and COP2. Powering down and up within regular
processing periods has its limits, because the operation to power down and to power up a
coprocessor consumes power as well. The local controllers CTR1 and CTR2 can have a
programmable register, for example, for storing a minimum value for the time interval
between receipt of the interrupt and start of the processing of a next frame. Only in case the
actual time interval is equal to or larger than this minimum value, the local controllers CTR1
and CTR2 power down the corresponding coprocessor. At the moment the processing of a
next video frame should start, the local controllers CTR1 and CTR2 power up the
coprocessors COP1 and COP2, respectively. In an alternative embodiment, the coprocessors
COP1 and COP2 are powered up by the central processing unit CPU, when it requests
processing of a next block of data.
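
Purely as an illustration, the timer-based decision described above may be expressed as follows. The function names `idle_time_us` and `should_power_down` and the parameter `min_idle_us` (standing in for the programmable minimum value) are hypothetical. It is further assumed that the time interval between two video frames equals one second divided by the number of frames per second, and that all times are counted in whole microseconds.

```python
def idle_time_us(frames_per_second, processing_time_us):
    """Time between the end-of-frame interrupt and the start of the next frame."""
    frame_interval_us = 1_000_000 // frames_per_second  # assumed: 1 s / fps
    return frame_interval_us - processing_time_us


def should_power_down(frames_per_second, processing_time_us, min_idle_us):
    """Power down only if the idle interval is at least the programmed minimum."""
    return idle_time_us(frames_per_second, processing_time_us) >= min_idle_us


# At 25 frames per second the frame interval is 40 000 microseconds.
long_idle = should_power_down(25, 28_000, min_idle_us=5_000)   # 12 000 us idle
short_idle = should_power_down(25, 38_000, min_idle_us=5_000)  # 2 000 us idle
```

The minimum value reflects that powering down and up itself consumes power, so that a short idle interval is not worth a power cycle.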

In another embodiment of the invention, the central processing unit CPU can
be further programmed to implement stall cycles for coprocessors COP1 and COP2,
interspersed between clock cycles of the sequence of clock cycles used for processing of data
by the coprocessors. During a stall cycle the coprocessors COP1 and COP2 still receive a
clock signal, but do not respond due to stall cycles generated by their corresponding local
controller. The usage of stall cycles for lowering the actual data transfer rate is further
described in United States Copending Application Serial nr. 09/920 042 (Attorney Docket
PHNL010506), also assigned to the present assignee, herein incorporated by reference. In
distributed data processing, data may be presented to or may be required from the system bus
SB on short notice and/or in high-intensity bursts. When such transfers would occur within
short time frames, overall system bus capacity would readily and frequently be exceeded,
which would then lead to a stall situation for the component requesting the transfer. The stall
cycles can be used to lower the actual transfer rate of data via the system bus SB, since when
a coprocessor executes one or more stall cycles no bus requests are made by that coprocessor.
An advantage of this embodiment is that it allows the trade-off between reducing the power
consumption of a coprocessor and spreading the consumption of bandwidth of the system bus
SB in time. In case the actual processing time of a coprocessor for a given set of data, for
example a video frame, is less than the time interval between two video frames, this time
difference can be used for spreading the bandwidth consumption by adding programmable
stall cycles in between the normal processing cycles, or to power down the coprocessor
during a period of time for each time interval between two video frames, as described in a
previous embodiment. Depending on the media processing application, the configuration of
the data processing system and the system requirements, an optimization between spreading
the bandwidth consumption and reducing the power consumption can be made.
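
The trade-off of this embodiment can be made concrete with a small Python sketch, given under stated assumptions only. The functions `intersperse_stalls` and `peak_in_window` are hypothetical. Each processing cycle is assumed to cause one unit of bus traffic, and the stall cycles are assumed to be spread as evenly as possible over the interval between two video frames. The schedule marks processing cycles as "P", stall cycles as "S" and powered-down cycles as "D".

```python
def intersperse_stalls(busy_cycles, total_cycles):
    """Spread `busy_cycles` processing cycles evenly over `total_cycles`."""
    if not 0 <= busy_cycles <= total_cycles:
        raise ValueError("busy_cycles must lie between 0 and total_cycles")
    schedule = []
    acc = 0
    for _ in range(total_cycles):
        acc += busy_cycles
        if acc >= total_cycles:
            acc -= total_cycles
            schedule.append("P")   # normal processing cycle
        else:
            schedule.append("S")   # programmed stall cycle
    return schedule


def peak_in_window(schedule, window):
    """Largest number of processing cycles in any `window` consecutive cycles."""
    counts = (
        schedule[i:i + window].count("P")
        for i in range(len(schedule) - window + 1)
    )
    return max(counts, default=0)


# Option 1: spread 4 processing cycles over a 12-cycle frame interval.
spread = intersperse_stalls(4, 12)
# Option 2: process back to back, then power down for the remainder.
burst = ["P"] * 4 + ["D"] * 8

peak_spread = peak_in_window(spread, 4)
peak_burst = peak_in_window(burst, 4)
```

In this example the spread schedule halves the peak bus load within any four-cycle window, whereas the burst schedule leaves eight consecutive cycles in which the coprocessor can be powered down.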

Referring again to Figure 2, in yet another embodiment the local controllers
CTR1 and CTR2 further comprise a so-called bandwidth control unit. The usage of a
bandwidth control unit for lowering the actual data transfer rate is further described in United
States Copending Application (Attorney Docket PHNL030795), also assigned to the present
assignee, herein incorporated by reference. Using these bandwidth control units, the
consumption of bandwidth by coprocessors COP1 and COP2 can be controlled by the
corresponding local controller CTR1 and CTR2, thereby effectively slowing down the
average data processing speed of the coprocessors COP1 and COP2, respectively. However,
if necessary, additional transfer capability can be provided, so that in most cases no longer a
stall situation would prevail. Bus arbitration, for example by means of a bus arbiter, is still
necessary, since the coprocessors COP1 and COP2 can still initiate bus transfers
simultaneously. The local controllers CTR1 and CTR2 further have registers to store
information on the height and width of a video frame, the number of frames per second that
the corresponding coprocessor has to process and the compute capacity of the corresponding
coprocessor. This information can be stored in the registers by the central processing unit
CPU. Using this information, the local controllers CTR1 and CTR2 calculate the minimum
time that is required by the corresponding coprocessor to process the data for one video
frame, the time interval between the receipt of two video frames, and the allowed maximum
data rate for bandwidth consumption. The allowed maximum data rate is based on the height
and width of a video frame and a chosen time interval, which is at most the time interval
between two video frames. The bandwidth control units restrict the average bandwidth
consumption of the corresponding coprocessor COP1 and COP2 to their allowed maximum
data rate. In case the coprocessors COP1 and COP2 have less bandwidth available than their



PHNL030941EPP 



10 24.07.2003 
own quoted bandwidth in a certain period during processing of a video frame, they can in
principle catch up for the discrepancy in a subsequent time period, before the receipt of the
next video frame. In a particularly advantageous embodiment such catch-up time is provided in
a brief so-called slack time that is situated at the end of the time interval between two video
frames and for which the maximum system bus bandwidth has been specified. At the moment
the coprocessors COP1 and COP2 start processing a series of video frames, the
corresponding local controller starts an internal timer. When the coprocessors COP1 and
COP2 finish processing a video frame, an interrupt is sent to local controllers CTR1 and
CTR2 respectively. The local controllers CTR1 and CTR2 determine the time period between
the receipt of the interrupt and the start of the processing of a next video frame. Depending
on the length of this time interval, the local controllers CTR1 and CTR2 may power down the
corresponding coprocessor COP1 or COP2. The local controllers CTR1 and CTR2 can have a
programmable register, for example, for storing a minimum value for the time interval
between receipt of the interrupt and start of the processing of a next frame. Only in case the
actual time interval is equal to or larger than this minimum value, the local controllers CTR1
and CTR2 power down the corresponding coprocessor. At the moment the processing of a
next video frame should start, the local controllers CTR1 and CTR2 power up the
coprocessors COP1 and COP2, respectively. An advantage of this embodiment is that it
allows the trade-off between reducing the power consumption of a coprocessor and spreading
the consumption of bandwidth of the system bus SB in time. The time interval for calculating
the allowed maximum data rate of a coprocessor can be chosen equal to the time interval
between two video frames, and in this case the bandwidth consumption of that coprocessor is
maximally spread. On the other hand, the time interval for calculating the allowed maximum
data rate can be chosen equal to the minimum time required for processing the video frame,
allowing the coprocessor to be powered down during the remainder of the time interval
between two video frames and maximizing the reduction in power consumption. Depending
on the media processing application, the configuration of the data processing system and the
system requirements, an optimization between spreading the bandwidth consumption and
reducing the power consumption can be made.
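
The two extreme choices of the time interval can be worked through numerically. The following Python sketch is given by way of example only; the function `allowed_max_rate` and the parameter `bytes_per_pixel` are assumptions introduced here, since the description only states that the allowed maximum data rate is based on the height and width of a video frame and on the chosen time interval. The frame size, frame interval and minimum processing time are illustrative figures, not values from the description.

```python
def allowed_max_rate(height, width, bytes_per_pixel, chosen_interval_us,
                     frame_interval_us):
    """Allowed maximum data rate in bytes per second for one video frame.

    The chosen interval may be at most the interval between two frames.
    """
    if not 0 < chosen_interval_us <= frame_interval_us:
        raise ValueError("chosen interval must be in (0, frame interval]")
    frame_bytes = height * width * bytes_per_pixel  # assumed data per frame
    return frame_bytes * 1_000_000 // chosen_interval_us


FRAME_INTERVAL_US = 40_000    # illustrative: 25 frames per second
MIN_PROCESSING_US = 10_000    # illustrative minimum processing time

# Extreme 1: interval equals the frame interval, bandwidth maximally spread.
rate_spread = allowed_max_rate(576, 720, 2, FRAME_INTERVAL_US, FRAME_INTERVAL_US)

# Extreme 2: interval equals the minimum processing time, so the coprocessor
# can be powered down for the remainder of the frame interval.
rate_burst = allowed_max_rate(576, 720, 2, MIN_PROCESSING_US, FRAME_INTERVAL_US)
power_down_us = FRAME_INTERVAL_US - MIN_PROCESSING_US
```

Under these assumptions the first choice limits the coprocessor to a quarter of the data rate of the second choice, while the second choice leaves three quarters of each frame interval available for powering down.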
Figure 3 shows an embodiment of a control unit CTR comprising a bandwidth
control unit BCTR, as well as a coprocessor COP coupled via a bus interface BI to a system
bus SB. The bandwidth control unit comprises an average calculation unit AV to calculate an
average amount of data Sta transferred via the bus interface BI to the system bus. To that end
the average calculation unit receives a signal St indicative of the amount of data transfer
taking place via the bus interface BI. The bandwidth control unit BCTR further comprises a
register UM for storing an indication for the allowed maximum data rate Stl. A comparator
CMP compares these signals and controls a gate G with control signal CT. Normally the gate
G transmits a bus request BRI from the bus interface BI as the signal BRO to a bus arbiter,
and the bus arbiter will respond with an acknowledge signal ACK if the bus is available.
However, if the average amount of data Sta transferred via the bus interface BI to the system
bus exceeds the allowed maximum data rate Stl, the control signal CT causes the gate G to
block the bus request signal BRI. In that case no request BRO is received by the arbiter, and
further data transmission is prevented until the average value Sta has decreased to a value
below the allowed value Stl. On the other hand, if it occurs that the system bus SB has not
been available for some time, because another device, for example a CPU having a high
priority, has occupied the bus, the average amount of data Sta transferred is substantially
lower than the allowed value Stl. In that case the coprocessor COP has the occasion to
temporarily increase data transfer until the average value Sta again reaches the allowed value
Stl.
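
The gating behaviour of Figure 3 can be illustrated with the following Python model, which is a sketch under stated assumptions and not a description of the actual circuit. The class `BandwidthControlUnit` and its methods are hypothetical. The average calculation unit AV is assumed to average over a fixed window of recent cycles, with cycles not yet observed counted as zero transfer, and every request passed by the gate G is assumed to be acknowledged by the bus arbiter in the same cycle.

```python
from collections import deque


class BandwidthControlUnit:
    """Hypothetical model of AV, the limit register, CMP and gate G."""

    def __init__(self, limit_per_cycle, window):
        self.limit = limit_per_cycle          # allowed maximum data rate Stl
        self.window = window                  # assumed averaging window of AV
        self.history = deque(maxlen=window)   # recent per-cycle amounts St

    def exceeds_limit(self):
        # Comparator CMP: Sta > Stl, written as sum > limit * window
        # so that no division is needed.
        return sum(self.history) > self.limit * self.window

    def step(self, bus_request_in, amount):
        """One cycle: gate BRI into BRO, then record the amount transferred."""
        bus_request_out = bus_request_in and not self.exceeds_limit()
        self.history.append(amount if bus_request_out else 0)
        return bus_request_out


# A coprocessor that requests 4 units every cycle, limited to an average of 2.
bctr = BandwidthControlUnit(limit_per_cycle=2, window=4)
grants = [bctr.step(True, 4) for _ in range(8)]
```

The first requests pass because the average starts low, which corresponds to the temporary increase in data transfer mentioned above; once the average exceeds the allowed value the gate blocks requests until the average has dropped again.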

It should be noted that the above-mentioned embodiments illustrate rather than
limit the invention, and that those skilled in the art will be able to design many alternative
embodiments without departing from the scope of the appended claims. In the claims, any
reference signs placed between parentheses shall not be construed as limiting the claim. The
word "comprising" does not exclude the presence of elements or steps other than those listed
in a claim. The word "a" or "an" preceding an element does not exclude the presence of a
plurality of such elements. The invention can be implemented by means of hardware
comprising several distinct elements, and by means of a suitably programmed computer. In
the device claim enumerating several means, several of these means can be embodied by one
and the same item of hardware. The mere fact that certain measures are recited in mutually
different dependent claims does not indicate that a combination of these measures cannot be
used to advantage.

CLAIMS:



1. A data processing system, comprising:

a plurality of processing elements (COP1, COP2), which are arranged for
synchronously processing data, under control of at least one clock facility;

at least one local controller (CTR1, CTR2) associated with a processing
element of the plurality of processing elements;

a data communication means (SB) arranged for exchanging data between
processing elements of the plurality of processing elements,

wherein the local controller is arranged for powering down its associated processing element
depending on the required processing capacity of that processing element.

2. A data processing system according to claim 1, wherein the local controller is
further arranged for powering up its associated processing element depending on the required
processing capacity of that processing element.

3. A data processing system according to claim 1, further comprising: 

- at least one buffer (BIl, BE) associated with the processing element of the 
plurality of processing elements, wherein the buffer is arranged for exchanging data between 
its associated processing element and the data communication means, 

and wherein the local controller is arranged to determine the required processing capacity of 
its associated processing element from the filling degree of the associated buffer. 

4. A data processing system according to claim 1, further comprising a control 
processor, 

wherein the local controller is arranged to receive information on the required processing 
capacity of the associated processing element from the control processor, 
and wherein the local controller is further arranged to have information on the processing 
capacity of the associated processing element. 

5. A data processing system according to claim 1, wherein the processing 
element of the plurality of processing elements is further arranged to generate an interrupt for 
notifying its associated local controller on the required processing capacity. 

6. A data processing system according to claim 1, wherein a sequence of clock 
cycles effects a processing operation of an amount of data, 

wherein the data processing system further comprises programmable means for implementing 
programmable stall clock cycles for the processing element of the plurality of processing 
elements, wherein the programmable stall clock cycles are interspersed between clock cycles 
of the sequence of clock cycles. 

7. A data processing system according to claim 1, wherein at least one processing 
element is associated with a bandwidth control unit (BCTR) for controlling a rate of its data 
transfer along the data communication means, the bandwidth control unit restricting the data 
transfer if it exceeds an allowed maximum data rate. 

8. A data processing system according to claim 1, further comprising a memory 
facility (MEM), 

wherein the data communication means is further arranged for exchanging data between the 
memory facility and the processing elements of the plurality of processing elements. 

9. A method for processing data, using a data processing system, comprising: 

- a plurality of processing elements (COP1, COP2), which are arranged for 
synchronously processing data under control of at least one clock facility; 

- at least one local controller (CTR1, CTR2) associated with a processing 
element of the plurality of processing elements; 

- a data communication means (SB) arranged for exchanging data between 
processing elements of the plurality of processing elements, 

wherein the method comprises the following steps: 

- supplying data to the processing element; 

- powering down of the processing element by the local controller if no data are 
available for processing by the processing element. 

10. A method for processing data according to claim 9, wherein the method 
further comprises the following step: 

- powering up of the processing element by the local controller if data are 
available for processing by the processing element. 

ABSTRACT: 



Programmable platforms include components such as a central processing unit 
(CPU), coprocessors (COP1, COP2), and a shared system bus (SB) that connects the various 
processors. In media processing applications, the processing of the functions is distributed to 
the central processing unit and the coprocessors. Such functions may be effected in hardware, 
in software, or in a mixture thereof. The utilization of each coprocessor may vary both for 
different applications as well as during execution of a single application, depending on the 
character of the media processing application. As a result, one or more coprocessors may not 
be effectively utilized during a certain part of the media processing. In case of a synchronous 
system those coprocessors continue consuming power. According to the invention, a 
coprocessor can be powered down by a local controller, depending on the workload of that 
coprocessor. As a result, power control is distributed and automatic, and depends only on the 
required processing capacity of the coprocessor. 